fix(h2): return Poll::Pending when poll_capacity is not ready in UpgradedSendStreamTask #4050
abbshr wants to merge 1 commit into hyperium:master
…adedSendStreamTask

Fix a backpressure bypass bug in UpgradedSendStreamTask::tick() where poll_capacity() returning Poll::Pending caused a 'break 'capacity' that fell through to rx.poll_next() -> send_data(), pushing data into the h2 send buffer without available capacity. This broke the HTTP/2 flow control chain, causing unbounded memory growth (OOM) when downstream consumers were slower than upstream producers.

The fix changes 'break 'capacity' to 'return Poll::Pending', which correctly suspends the task until a WINDOW_UPDATE frame restores send capacity. The now-unused 'capacity label is also removed.

This bug was introduced in hyper v1.8.0 (PR hyperium#3967) and affects v1.8.0, v1.8.1, and v1.9.0. A single HTTP/2 CONNECT tunnel with asymmetric upstream/downstream speeds could trigger OOM within seconds.

Add four integration tests covering H2 CONNECT backpressure scenarios:
- h2_connect_backpressure_respected: small window + large data transfer
- h2_connect_zero_window_then_release: normal path regression guard
- h2_connect_reset_during_backpressure: RST_STREAM error propagation
- h2_connect_backpressure_bidirectional: bidirectional data + backpressure
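To make the failure mode described above concrete, here is a minimal sketch in plain Rust. It is a toy model, not hyper's or h2's code: ToySendStream, tick, and run_burst are hypothetical names, the window is never replenished, and send_data is assumed to queue whatever does not fit into the window, which is the behavior the description attributes to the real call.

```rust
use std::collections::VecDeque;
use std::task::Poll;

/// Hypothetical stand-in for an h2 send stream.
/// `window` is the remaining flow-control capacity in bytes;
/// `buffered` counts bytes accepted beyond what the window allowed.
struct ToySendStream {
    window: usize,
    buffered: usize,
}

impl ToySendStream {
    /// Ready with the available capacity, or Pending when the window is empty.
    fn poll_capacity(&self) -> Poll<usize> {
        if self.window > 0 {
            Poll::Ready(self.window)
        } else {
            Poll::Pending
        }
    }

    /// Assumption: never refuses data; the part that exceeds the window is queued.
    fn send_data(&mut self, len: usize) {
        let sent = len.min(self.window);
        self.window -= sent;
        self.buffered += len - sent;
    }
}

/// Simplified task loop. With `gate_on_capacity == false` a Pending capacity
/// check is ignored (the fall-through described above); with `true` the task
/// suspends instead of pulling more data from the producer.
fn tick(tx: &mut ToySendStream, rx: &mut VecDeque<usize>, gate_on_capacity: bool) -> Poll<()> {
    loop {
        if gate_on_capacity && tx.poll_capacity().is_pending() {
            return Poll::Pending;
        }
        match rx.pop_front() {
            Some(len) => tx.send_data(len),
            None => return Poll::Ready(()),
        }
    }
}

/// Offers ten 1024-byte chunks against a 1024-byte window and returns
/// (bytes queued past the window, chunks still held by the producer).
fn run_burst(gate_on_capacity: bool) -> (usize, usize) {
    let mut tx = ToySendStream { window: 1024, buffered: 0 };
    let mut rx: VecDeque<usize> = std::iter::repeat(1024usize).take(10).collect();
    let _ = tick(&mut tx, &mut rx, gate_on_capacity);
    (tx.buffered, rx.len())
}
```

In this model run_burst(false) queues nine chunks past the window, while run_burst(true) leaves those nine chunks with the producer, which is where backpressure is supposed to hold them.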
cc @seanmonstar
             )));
         }
-        Poll::Pending => break 'capacity,
+        Poll::Pending => return Poll::Pending,
The comment from L71-L74 I think is relevant to why this previously did not return early. We want to make sure the waker is registered with each of the futures, so that if one side "cancels", the task can clean up quickly.
- We want to notice when capacity has become available.
- Or when the remote has sent a RST_STREAM (or other error)
- Or when our bytes sender (on the me.rx side) has closed and no longer expects to send more data.
Said another way, if we're waiting for capacity, and the user drops the Upgraded type (meaning they no longer want to write), this UpgradedSendStreamTask will not notice and will hang around until capacity is eventually given (if the peer ever gives it), and only then hang up.
I get what you're trying to do, but I think the types or channels would need to be adjusted a little to handle those cases.
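The waker concern above can be illustrated with a small sketch. It is a toy model under stated assumptions, not hyper's code: Registered, poll_pending, tick_early_return, and tick_poll_all are hypothetical names, and every source is modeled as never ready so that the only observable effect of polling is that a wake-up was requested.

```rust
use std::task::Poll;

/// Records which of the three sources asked to be woken during one tick.
#[derive(Debug, Default, PartialEq)]
struct Registered {
    capacity: bool,
    reset: bool,
    rx: bool,
}

/// Hypothetical source that is never ready. Polling it sets the flag,
/// standing in for "the waker is now registered with this source".
fn poll_pending(flag: &mut bool) -> Poll<()> {
    *flag = true;
    Poll::Pending
}

/// Returns as soon as capacity is Pending, so the other two sources are
/// never polled on this pass and cannot wake the task.
fn tick_early_return(seen: &mut Registered) -> Poll<()> {
    if poll_pending(&mut seen.capacity).is_pending() {
        return Poll::Pending;
    }
    let _ = poll_pending(&mut seen.reset);
    poll_pending(&mut seen.rx)
}

/// Manual select: polls all three sources before deciding, so a reset or a
/// closed sender can wake the task while it waits for capacity.
fn tick_poll_all(seen: &mut Registered) -> Poll<()> {
    let capacity = poll_pending(&mut seen.capacity);
    let reset = poll_pending(&mut seen.reset);
    let rx = poll_pending(&mut seen.rx);
    if capacity.is_ready() || reset.is_ready() || rx.is_ready() {
        Poll::Ready(())
    } else {
        Poll::Pending
    }
}
```

The sketch leaves out the hard part of the discussion: in the real task, polling the receiver can also yield a data chunk, and that chunk then has to be either sent or held somewhere until capacity arrives.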
Thanks for replying.
I re-read the comments and the code. As the comments say, there are three subtasks within tick: h2_tx.poll_capacity(), h2_tx.poll_reset(), and rx.poll_next().
I agree with the case you described: "when the remote has sent a RST_STREAM (or other error)" should be noticed as soon as possible, because it relates to the semantics of the h2 context. If the h2 channel can no longer work, the whole task should be dropped.
But I think rx.poll_next() is the other half of the whole transaction and comes after the h2 side: if no write operations are permitted on the h2 channel, the rx channel should not be polled at all.
So I think the change should look something like this:
    // check capacity
    let h2_has_capacity = loop {
        match me.h2_tx.poll_capacity(cx) {
            ...
            // just break out of the loop and report that there is no capacity
            Poll::Pending => break false,
        }
    };
    // handle the h2_tx RST_STREAM case
    match me.h2_tx.poll_reset(cx) {
        ...
    }
    // no capacity: do not touch the rx side, wait to be woken instead
    if !h2_has_capacity {
        return Poll::Pending;
    }
    // poll the rx side for data
    match me.rx.as_mut().poll_next(cx) {
        Poll::Ready(Some(cursor)) => {
            me.h2_tx
                .send_data(SendBuf::Cursor(cursor), false)
                .map_err(crate::Error::new_body_write)?;
        }
        Poll::Ready(None) => {
            // sender closed: finish the stream with an empty end-of-stream frame
            me.h2_tx
                .send_data(SendBuf::None, true)
                .map_err(crate::Error::new_body_write)?;
            return Poll::Ready(Ok(()));
        }
        Poll::Pending => {
            return Poll::Pending;
        }
    }

Correct me if I'm wrong.
Fix #4049